Support Vector Machines for Speech Recognition
Hidden Markov models (HMM) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a representational model for acoustic modeling, which can be prone to overfitting and does not translate to improved discrimination. We propose a new paradigm centered on principles of structural risk minimization using a discriminative framework for speech recognition based on support vector machines (SVMs). SVMs have the ability to simultaneously optimize the representational and discriminative ability of the acoustic classifiers. We have developed the first SVM-based large vocabulary speech recognition system that improves performance over traditional HMM-based systems. This hybrid system achieves a state-of-the-art word error rate of 10.6% on a continuous alphadigit task, a 10% improvement relative to an HMM system. On SWITCHBOARD, a large vocabulary task, the system improves performance over a traditional HMM system from 41.6% word error rate to 40.6%. This dissertation discusses several practical issues that arise when SVMs are incorporated into the hybrid system.
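The structural risk minimization principle mentioned above is usually realized through the standard soft-margin SVM objective (the generic formulation, not a detail taken from this dissertation):

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\|w\|^2 \;+\; C\sum_{i=1}^{N}\xi_i
\quad \text{s.t.}\quad y_i\left(w^\top x_i + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0
```

The $\|w\|^2$ term bounds model complexity (the "structural" part of the risk), while the slack variables $\xi_i$, weighted by $C$, trade margin violations against training error; this is what lets an SVM control generalization directly rather than only fitting the observation densities.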
Implementing contextual biasing in GPU decoder for online ASR
GPU decoding significantly accelerates the output of ASR predictions. While
GPUs are already being used for online ASR decoding, post-processing and
rescoring on GPUs have not been properly investigated yet. Rescoring with
available contextual information can considerably improve ASR predictions.
Previous studies have proven the viability of lattice rescoring in decoding and
biasing language model (LM) weights in offline and online CPU scenarios. In
real-time GPU decoding, partial recognition hypotheses are produced without
lattice generation, which makes the implementation of biasing more complex. The
paper proposes and describes an approach to integrate contextual biasing in
real-time GPU decoding while exploiting the standard Kaldi GPU decoder. Besides
the biasing of partial ASR predictions, our approach also permits dynamic
context switching allowing a flexible rescoring per each speech segment
directly on GPU. The code is publicly released and tested with open-sourced
test sets. Comment: Accepted to Interspeech 202
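The biasing of partial hypotheses without lattices can be pictured as a simple rescoring pass over each partial result, with a per-segment context list. The sketch below is purely illustrative; the function and the `segment_contexts` mapping are hypothetical and do not reflect the Kaldi GPU decoder API.

```python
def bias_partial_hypothesis(tokens, base_score, context_phrases, boost=2.0):
    """Add a fixed bonus to a partial hypothesis score for every
    context phrase found in the decoded token sequence.
    Illustrative only -- not the Kaldi interface."""
    text = " ".join(tokens)
    bonus = sum(boost for phrase in context_phrases if phrase in text)
    return base_score + bonus

# Dynamic context switching: a different phrase list per speech segment.
segment_contexts = {
    "seg1": ["acme corp", "john doe"],
    "seg2": ["kaldi", "gpu decoder"],
}

hyp = ["call", "john", "doe", "now"]
rescored = bias_partial_hypothesis(
    hyp, base_score=-12.5, context_phrases=segment_contexts["seg1"]
)
```

Because partial hypotheses arrive continuously during online decoding, the bonus must be applied incrementally per segment rather than once on a final lattice, which is the complication the paper addresses.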
Risk Minimization Approaches in Signal Processing
Statistical techniques based on Hidden Markov models (HMMs) with Gaussian emission densities have dominated the signal processing and pattern recognition literature for the past 20 years. However, HMMs suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. SVMs have been shown to provide significant improvements in performance on small pattern recognition tasks compared to a number of conventional approaches. SVMs, however, require ad hoc (and unreliable) methods to couple them to probabilistic learning machines. Probabilistic Bayesian learning machines, such as the relevance vector machine (RVM), are fairly new approaches that attempt to overcome the deficiencies of SVMs by explicitly accounting for sparsity and statistics in their formulation. In the proposed paper, we will review the past 30 years of research into these new learning machines, and describe how they can be used to solve many traditional signal processing problems. Unifying themes in this work are the concepts of risk minimization and margin maximization, which can be viewed as a generalization of the maximum likelihood principle so fundamental to many signal processing approaches. It is our belief that this information has not been previously explained in a way that makes it accessible to mainstream signal processing researchers, so we believe this paper will have significant tutorial value.
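The contrast between margin maximization and maximum likelihood can be seen directly in the two standard surrogate losses: the SVM's hinge loss is exactly zero once an example clears the margin, whereas the negative log-likelihood of a logistic model penalizes every example, however confidently classified. A minimal numeric sketch, using the textbook formulas:

```python
import math

def hinge_loss(score, label):
    # SVM hinge loss: zero once the example is beyond the unit margin.
    return max(0.0, 1.0 - label * score)

def log_loss(score, label):
    # Negative log-likelihood of a logistic model: strictly positive
    # for any finite score, so every example keeps pulling on the fit.
    return math.log(1.0 + math.exp(-label * score))

# A confidently correct example (score 2.0, label +1) incurs no hinge
# loss but still carries a small likelihood penalty.
h = hinge_loss(2.0, +1)   # 0.0
l = log_loss(2.0, +1)     # ln(1 + e^-2), roughly 0.127
```

This zeroing-out is what concentrates the SVM solution on the boundary examples (the support vectors), while maximum likelihood spreads influence over all of the data.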